\input paper.tex

\TITL{COGNITIVE  ECONOMY}

\vfill

\AUTHO{Douglas B. Lenat, \ Stanford University}

\AUTHO{Frederick Hayes-Roth, \ Rand Corporation}

\AUTHO{Phillip Klahr, \ Rand Corporation}

\vfill

\vskip 10pt plus 2pt

\NNSECP{ABSTRACT}
       Intelligent systems can explore only tiny subsets of their potential
       external and conceptual worlds.   They must develop efficient  forms
       of  representation,  access,   and  operation   to  increase   their
       capacities.  Some of these  forms involve abstraction, caching,  and
       expectation-simplified processing.  These capabilities in turn can
       combine to provide extremely powerful increases in performance.  For
       example, all three can combine to simplify simulation or one of its
       related functions, the detection of surprising events.  Our analysis of
       the economic principles  of modern  AI systems  or (presumably  more
       sophisticated)  human  intelligence  suggests  that  previous  ideas
       regarding cognitive efficiency have erred in fundamental ways.   For
       example, the  nonredundant  storage of  properties  in  hierarchical
       inheritance nets  increases many  processing costs  while  providing
       minimal storage cost  savings.  We  propose methods  to exploit  the
       potential advantages of both redundant and nonredundant schemes.

\vfill

\eject

\NSECP{INTRODUCTION}


Every scientific theory is  constructed in a  rich context of  surrounding
theories, methods,  and standards  which determine  which experiments  are
reasonable ones to perform and which point of view to take when  interpreting
the results -- in short, a {\it paradigm}. We feel it useful
to articulate the "hard core" of our paradigm (the assumptions our theory
rests upon) before presenting the problem (Section 3) and our principal
ideas for its solution (Sections 4-6). Among our assumptions are:
a model for intelligence
(Section 2.1),  a  model for  how  intelligent computer  programs  may  be
organized (Section 2.2),  and a  model of the  characteristics of  present
computing engines (Section 2.3).

In highly condensed form, our argument proceeds as follows:
(i) We continually face searches in more or less immense spaces; intelligence is
the ability to bring {\it appropriate} knowledge to bear, to speed up such searching.
Increasing intelligence then comprises increasing knowledge, its
organization, and the conditions for its applicability. 
How is this new knowledge discovered? Empirical induction
(generalizing from experimental and other observations), analogy, and the criticism
of existing theory all lead to new conjectures, new possibilities.
(ii) Intelligent systems can make the applicability of their knowledge explicit
by representing that knowledge as condition/action rules  (If {\sl situation}
then {\sl appropriate reaction}). Such pattern-directed inference systems (PDIS)
can benefit from a schema representation (frame, unit, being, script, etc.), because
this adds structure which the rules can then take advantage of.
(iii) Computers currently present us with limited cycles, cheap storage,
and expensive knowledge acquisition.
(iv) The problem: We want to develop initial systems quickly, and then have
them speed up and economize their computing, to maximize their potential.
(v) Many AI researchers {\sl cum} language designers have focused on the first
half, resulting in excellent software such as Interlisp's DWIM, File, and 
Break Packages. We therefore call attention to the second half, to ways in
which programs can (semi-)automatically improve themselves.
Our principal suggestions for meeting this challenge are:
redundant representation of knowledge at
multiple levels of
abstraction, caching of computed results,
and expectation-simplified computing.


\NSECP{THE ASSUMPTIONS}

\SSEC{Our Model of Intelligence}


Many human cognitive activities, including most of those commonly referred
to as "requiring intelligence", can be cast as searches, as explorations through
a search space, meandering toward some goal.  We call a problem-solver more
intelligent if he can efficiently work toward a solution even in the face of an
immense search space and an ill-defined goal.  
Somehow, he is imposing more structure on the problem, and using that to
search more effectively. How might he do this?  

According to our model of human intelligence, he accomplishes his task by
bringing knowledge to bear, knowledge not supplied explicitly in the problem
statement.  This knowledge can be about problem-solving in
general (e.g., how to recognize and break cultural blocks)
or about the problem's domain specifically
(e.g., which groups of atoms can usually be treated as superatoms). 
It may have been learned long ago, or generated during the
early phase of work on the problem.

This implies that a problem solver can become more effective by
(i) increasing his knowledge, (ii) better organizing his knowledge, and
(iii) refining his criteria for deciding when various pieces of knowledge
are applicable.  In terms of production (If/Then) rules, these correspond to
adding new rules, modifying the data structure in which rules are held,
and modifying the conditions (IF parts) of existing rules.

How is new knowledge discovered? One route is that of {\it abstraction}: condensing
many experiences into a heuristic which, in hindsight, we see would have
greatly aided us in the past, which would have speeded up our reaching this
state in the search. Closely related is {\it generalization}, often merely
expanding the domain of relevance of a specific bit of knowledge we already
possessed. {\it Analogy} is one of the most useful techniques; one can draw
parallels not merely to other existing facts and heuristics, but also to the
{\sl paths} which led to their discovery (e.g., even if a result in organic
chemistry does not map over into the inorganic world, the experiment which
led you to that first fact may map over into an experiment which {\it will}
reveal a useful fact in inorganic chemistry; even though the analogue of a
mathematical theorem may be false in some new domain, the analogous proof technique
may be useful). Another path to discovery is {\it specialization} of more
general knowledge. Finally, we must mention the process of {\it hypothesis,
criticism, and improved hypothesis}, which is a common and powerful
method of spiralling
in toward precision.  In mathematics [Lakatos] and many
scientific disciplines [Musgrave & Lakatos], counterexamples
to current theories arise frequently, and their incorporation often demands a
deepening of the theory, an increase in knowledge.

Experience with programs such as AM [ref] supports our assertion that
these methods can adequately
guide even open-ended searches for new definitions and conjectures.
But how can an intelligent system be programmed?  How can such systems
be organized?


\SSEC{Our Model of Intelligent Program Organization}

The above remarks about intelligent problem solving apply to hardware
and wetware alike. Computer programs which are to be intelligent must
ultimately grapple with the tasks of knowledge acquisition, representation,
and refinement. We cannot provide an absolute answer to how they should be
organized, but we can posit some design constraints which have proven useful
so far.

A very general heuristic in AI programming is the following: If your program
is going to modify its own {\bf X}, then make {\bf X} as separable, clean, and explicit as
possible. In our case, {\bf X} can be instantiated as "knowledge", or as "applicability
conditions for each piece of knowledge". Thus the heuristic advises us to
represent our knowledge in a separate, clean, explicit form, say as knowledge
modules having some fixed structure, and also advises us to keep the applicability
conditions for each knowledge module separate from the rest of the knowledge it
contains. 

This naturally leads us to a pattern-directed inference system, in
which knowledge is broken into separate modules, each with an explicit set of
relevancy tests. Such systems arising in Pittsburgh may overemphasize
cleanliness (so-called pure production systems), while those arising in
California may tend to be a bit too laid back (so-called ad hoc expert systems),
but such variations should not obscure their common source of power.
The dominant PDIS architecture has knowledge broken into a set of condition/action
production rules of the form IF {\sl triggering situation} THEN {\sl appropriate
reaction}.
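
To make the flavor of such rules concrete, here is a minimal sketch in Python;
the Rule class, the situation dictionary, and the sample rule are all
hypothetical illustrations, not the representation of any particular system:

# A minimal sketch of condition/action rules for a pattern-directed
# inference system.  All names are illustrative, not any system's actual API.

class Rule:
    def __init__(self, name, condition, action):
        self.name = name
        self.condition = condition      # situation -> True / False
        self.action = action            # situation -> side effects

    def triggered(self, situation):
        return self.condition(situation)

    def fire(self, situation):
        return self.action(situation)

# IF a concept has no known examples, THEN propose the task of finding some.
need_examples = Rule(
    "need-examples",
    condition=lambda s: not s["concept"]["examples"],
    action=lambda s: s["agenda"].append(("fill-in-examples", s["concept"]["name"])))

situation = {"concept": {"name": "Prime-Numbers", "examples": []}, "agenda": []}
if need_examples.triggered(situation):
    need_examples.fire(situation)       # agenda now holds ("fill-in-examples", ...)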

Having a clean representation for rules means having a clean, precise, elegant
language in which to express them.  By structuring the conceptual knowledge
of the system, by partitioning each module's knowledge into several categories,
a rule can condense an entire cumbersome description into a single pointer.
The popular schematized representations of knowledge (scripts for episodes,
frames for static situations, beings for specialists, units for everything)
enable a rule like "If there are no known methods specific to finding new examples
of prime numbers, Then..."
to have its condition coded as a simple 
null-test on the To-get subslot of the Examples slot of 
the schema for Prime Numbers.
By a judicious choice of slot types, the system builder can reduce most
triggering conditions to such quick checks on the state of various (sub)slots
of schemata.
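
In concrete terms (a sketch only; the slot layout below merely mimics the
Prime Numbers example above and is not the format of any actual schema
language):

# Sketch of a schema with typed slots and subslots, where a triggering
# condition reduces to a quick null-test.  The slot names follow the
# Prime Numbers example in the text; everything else is assumed.

prime_numbers = {
    "name": "Prime-Numbers",
    "generalizations": ["Numbers"],
    "examples": {
        "known":  [2, 3, 5, 7, 11],
        "to-get": [],      # methods specific to finding new examples
    },
}

def no_specific_example_methods(schema):
    # "IF there are no known methods specific to finding new examples ..."
    return not schema["examples"]["to-get"]

print(no_specific_example_methods(prime_numbers))    # True: the rule triggers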

Additional knowledge is required to efficiently locate all the
rules which {\it might} have their conditions satisfied in a given situation,
and also to decide which rules to actually fire (obey) among those whose
IF parts have actually triggered (been satisfied).

The location of potentially-relevant rules is facilitated by organizing the
rules into some structure.  For example, AM organized  mathematical concepts
into a generalization/specialization hierarchy, and tied each rule to its most
relevant concept. To find rules potentially relevant to concept C, AM then
needed only to look at the rules tacked onto C, C's generalizations, their
generalizations, and so on. By inheritability, any rule relevant to
judging Objects in general was (potentially) relevant to judging
an Abelian group.
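
The lookup itself can be sketched as a simple walk up the hierarchy (the
concept table and rule names below are toy stand-ins, not AM's actual data):

# Sketch of locating potentially relevant rules by rippling up a
# generalization/specialization hierarchy, in the manner described for AM.

concepts = {
    "Abelian-Group": {"generalizations": ["Group"],  "rules": []},
    "Group":         {"generalizations": ["Object"], "rules": ["check-axioms"]},
    "Object":        {"generalizations": [],         "rules": ["judge-interest"]},
}

def potentially_relevant_rules(concept_name):
    """Collect rules tacked onto a concept, its generalizations, theirs, and so on."""
    seen, frontier, rules = set(), [concept_name], []
    while frontier:
        c = frontier.pop()
        if c in seen:
            continue
        seen.add(c)
        rules.extend(concepts[c]["rules"])
        frontier.extend(concepts[c]["generalizations"])
    return rules

print(potentially_relevant_rules("Abelian-Group"))
# ['check-axioms', 'judge-interest']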

This requires an explicit focusing of attention, a selection of a "current
topic" C from which the above rippling proceeds. We favor the maintenance
of a separate data structure for this purpose, an agenda of topics to consider,
of subtasks to work on. Knowledge may be brought to bear to select a topic,
then the above quest for potentially relevant rules is carried out, then they
are examined to find the truly relevant, satisfied set, and finally some subset
of those is fired off.
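
Put together, the control cycle we have in mind looks roughly like this (a
sketch reusing the Rule objects sketched earlier; both selection helpers are
deliberately trivial stand-ins for knowledge-based choices):

# Sketch of the agenda-driven control cycle described above.  Both helper
# policies are placeholders; a real system would bring knowledge to bear here.

def select_best_topic(agenda):
    return agenda.pop(0)                  # stand-in: first-come, first-served

def choose_subset_to_fire(satisfied_rules):
    return satisfied_rules[:1]            # stand-in: fire only the first rule

def control_cycle(agenda, situation, rules_for_topic):
    while agenda:
        topic = select_best_topic(agenda)
        candidates = rules_for_topic(topic)       # ripple for potentially relevant rules
        satisfied = [r for r in candidates if r.triggered(situation)]
        for rule in choose_subset_to_fire(satisfied):
            rule.fire(situation)          # firing may add new tasks to the agenda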


\SSEC{Our Model of (Present) Computers}


Since we are considering the problem of building computer models of
intelligent behavior, many of our assumptions deal with the characteristics
of present-day computers. They are the symbol manipulators we use,
though they were not designed for that general purpose.

[RICK:  FILL THIS SECTION OUT. BASIC IDEA WAS:
	Computers currently present us with limited cycles, cheap storage,
	uniprocessing, and expensive knowledge acquisition.]


\NSECP{THE PROBLEM}

When we build an AI program, we often find ourselves caught in the following
tradeoff: On the one hand, we want to develop our initial systems quickly
and painlessly, since they are experimental and ephemeral. On the other hand,
we want our systems to run as efficiently as possible, to minimize the
temporal delays, the magnitude of the cpu time required.  Quick and easy
implementation is at odds with the careful analyses that lead to programs which are
optimized, which have maximized their potential.

For example, Stanford's Ray Carhart spent much of 1978 visiting Edinburgh,
rewriting Dendral (actually, its primary module, the CONGEN constrained
generation program) efficiently for a minicomputer.  A similar attempt may be
made soon for the Mycin program.  Typical running times of several cpu hours are not
extravagant in AI, even though a carefully optimized version might run in
5% of the time. This is because the researcher, forced to decide between
expressibility and economy, chooses the former.  It's better to accomplish
something inefficiently than to fail in an attempt which tried to save a few
hours of computation.


Many AI researchers {\sl cum} language designers have focused on the first
half,
the problem of rapidly constructing a working experimental vehicle.
They have produced some
excellent software such as Interlisp's DWIM, File, and
Break Packages [ref].


This paper is attempting to draw attention to the second half, to ways in
which programs can (semi-)automatically improve themselves.
Ideas already exist which enable us to build in more "ultimate efficiency"
(e.g., the ability for a program to automatically increase its efficiency,
albeit at a nontrivial cost during the knowledge acquisition phase)
without risking the failure of the entire project.

Some earlier automatic programming systems [Darlington & Burstall, Lenat's PUP6,
others?] were designed to improve programs' efficiency, and many of those
ideas have found their way into the techniques we now describe.  
Dave Barstow and Elaine Kant have developed a pair of programs which respectively
(i) explicate the choices available for representation, control, etc., and (ii) choose
among them, using a model of the overall structure of the program, the user's intentions,
general analysis-of-algorithms techniques, and other knowledge.
Earlier, Jim Low built a system [ref] which chose
appropriate data structures based upon the flavor of accesses
that were frequently made to a body of information.
These AP systems
had program construction, transformation, and optimization as their primary
task; we are proposing ways to provide other kinds of AI programs
similar self-improving abilities.


Our principal suggestions for meeting this challenge are:
(i) redundant representation of knowledge at
multiple levels of
abstraction, (ii) caching of computed results,
and (iii) using predictions to filter away expected, unsurprising data.
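
To make suggestions (ii) and (iii) concrete, the following minimal sketch (in
Python) shows one way they might be realized; the version-based cache
invalidation policy and the tolerance test are illustrative assumptions, not
prescriptions:

# Sketch of suggestion (ii), caching: store a computed result under the
# arguments that produced it, and reuse it while it is still trustworthy.
# Here "trustworthy" means the knowledge base version has not changed,
# an assumption made purely for illustration.

def cached(fn, current_version):
    store = {}
    def wrapper(*args):
        if args in store:
            value, version = store[args]
            if version == current_version():
                return value                      # reuse the cached result
        value = fn(*args)                         # the expensive recomputation
        store[args] = (value, current_version())
        return value
    return wrapper

# Sketch of suggestion (iii), expectation-filtering: pass along for full
# processing only those observations which violate expectations.
def surprising(observation, expected, tolerance):
    return abs(observation - expected) > tolerance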